Abstract

This paper presents an analysis of spinal codes, a class of rateless codes proposed recently [17].
We prove that spinal codes achieve Shannon capacity for the binary symmetric channel (BSC) and 
the additive white Gaussian noise (AWGN) channel with an efficient polynomial-time encoder and 
decoder. They are the first rateless codes with proofs of these properties for BSC and AWGN. 

The key idea in the spinal code is the sequential application of a hash function over the message 
bits. The sequential structure of the code turns out to be crucial for efficient decoding. Moreover, 
contrary to the conventional wisdom that good codes need an expander structure [21], we show that the spinal
code, despite its sequential structure, achieves capacity. The pseudo-randomness provided by a hash 
function suffices for this purpose. 

Our proof introduces a variant of Gallager's result characterizing the error exponent of random 
codes for any memoryless channel [10, Chapters 5, 7]. We present a novel application of these error-exponent
results within the framework of an efficient sequential code. The application of a hash
function over the message bits provides a methodical and effective way to de-randomize Shannon's
random codebook construction [19].



*HB, PI, and JP are affiliated with CSAIL; DS is affiliated with LIDS. 



1 Introduction 



In a rateless code, the codewords (i.e., coded bits or symbols) corresponding to higher-rate encodings 
are prefixes of lower-rate encodings. Rateless codes have been known since Shannon's random codebook 
construction [19], which proved the existence of capacity-achieving codes. Unfortunately, the random 
codebook is computationally intractable to decode, taking time exponential in the message size. It 
took several decades of research on coding theory and algorithms before practical rateless codes were 
discovered for the binary erasure channel (BEC) by Luby (LT codes [13]) and Shokrollahi (Raptor 
codes [20]). The BEC is a good model for packet losses on the Internet.

For wireless channels, however, packet erasure models give way to more appropriate random bit-flip 
models (at the link layer) and additive noise models (at the physical layer). Moreover, wireless channel 
conditions vary with time due to mobility and interference, even over durations as short as a single packet 
transmission. In this setting, fixed-rate (or fixed-length) codes that work well at a fixed (and known) bit-flip
probability or signal-to-noise ratio (SNR) are by themselves insufficient to achieve high throughput;
they require additional (and complex) heuristics to determine what the channel conditions are, and to 
pick the right code [21 O [131 122], resulting in a system without any theoretically appealing properties. 
This task becomes difficult with rapid channel variations, numerous transmission rate alternatives, and 
multiple transmitters contending for the same wireless channel. 

In contrast to fixed-rate codes, a good rateless code will adapt automatically to changing conditions 
because it will inherently transmit just the right amount, whatever the conditions. Because they are a 
natural fit for time-varying wireless networks, the design of good rateless codes for the binary symmetric
channel (BSC) and the additive white Gaussian noise (AWGN) channel has received renewed interest
recently [6], [11], [17]. By "good", we mean a code that achieves a rate close to channel capacity: $1 - H(p)$
for the BSC, where $p$ is the bit-flip probability and $H(p) = -p\log p - (1-p)\log(1-p)$, and $\frac{1}{2}\log(1+\mathrm{SNR})$
for the AWGN channel, where SNR is the ratio of the signal power to the noise variance.¹

In this paper, we prove that a family of rateless codes, called spinal codes, achieves capacity over 
both the BSC and the AWGN channel. Spinal codes are the first provably capacity-achieving rateless 
codes with a polynomial-time encoder and decoder over both these standard channel models. Our work 
provides for the BSC and AWGN channel what LT [13] and Raptor [20] codes provide for the BEC, but
with a rather different approach. 

Spinal codes use hash functions satisfying pairwise independence [15] to produce a sufficiently
random codebook. The encoder for a spinal code applies the hash function sequentially over groups 
of message bits in a structure that resembles a classic convolutional code. The maximum-likelihood 
(ML) decoder for a spinal code constructs a tree of possibilities by replaying the encoder over various 
possible input message bits, and computes either the Hamming distance (BSC) or squared Euclidean 
distance (AWGN) between the received data and the various choices in the tree. A complete tree is, 
of course, exponential in the message size, but our key result is that one can aggressively prune the 
decoding tree to obtain an efficient decoder with polynomial computational cost, that still essentially 
achieves capacity. 

Our approach highlights how the strict avalanche criterion (SAC) property of the hash function provides a way to de-randomize
Shannon's random codebook [19] approach to produce a practical, capacity-achieving rateless code. As
such, our proof methods are likely to extend to de-randomize, and possibly render practical, various 
random coding constructions in Information Theory that have hitherto been widely used to characterize 
existential capacity results (cf. El Gamal and Kim [1]). 

¹ In this paper, log denotes the logarithm to base 2 and ln denotes the natural logarithm.
Prior work. Raptor codes, though designed primarily for erasure channels (on which they provably
achieve capacity), can be extended to AWGN and BSC channels with a belief propagation decoder [16] 
similar to graphical codes like LDPC [8], [23]. However, not much is known theoretically about how good
this code is over these channels. In fact, the capacity of LDPC codes (with an efficient decoder) over 
both the BSC and AWGN channels, in general, is still unresolved, which is further evidence that the 
BSC and AWGN channels are non-trivial settings for the design and analysis of good codes. 

Recently, an interesting "layered" approach has been developed by Erez, Trott, and Wornell [5] ([11]
describes an implementation of this concept) primarily for the AWGN channel, but there is no obvious 
way to extend it to the BSC. In this approach, a layered rateless code is built upon a capacity-achieving 
fixed-rate "base" code at the lowest layer. Erez et al. prove that their code achieves capacity over 
AWGN assuming that the base code achieves capacity at some SNR and the number of layers increases 
without bound. Our work improves on this layered approach in two ways: first, it resolves
an open question raised by Erez et al. about designing an efficient capacity-achieving rateless code for the BSC;
second, it is a more direct and natural construction that does not rely on layering atop a (presumed
capacity-achieving) fixed-rate base code.

Structurally, spinal codes are similar to convolutional codes [5j El] , which apply a linear function 
sequentially over the message bits, but such codes with so-called small state (constraint length) are 
far from capacity (in large part because of their sequential nature). In contrast, to achieve capacity 
using linear codes (whether fixed-rate or rateless) over the BEC, prior work suggests that some form 
of random graph ensemble or expander structure is necessary [5] EJJ . Somewhat surprisingly, despite 
their sequential nature, we are able to establish that spinal codes, by using a hash function with
pairwise independence, achieve capacity.

Our results. For the BSC, we show that rateless spinal codes can be encoded in $O(n\log n/\epsilon)$ time and
decoded in $n^{O(1/\epsilon^3)}$ time, where $n$ is the number of message bits and $\epsilon$ is the gap to capacity at which the
code is operating (i.e., the achieved rate is within $\epsilon$ of capacity). This result holds for $n = \Omega(1/\epsilon^5)$. For
the AWGN channel, we establish a similar result with somewhat lower computational cost: $O(n\log n/\epsilon)$
time for encoding and $n^{O(1/\epsilon^2)}$ time for decoding.

Thus, by selecting $n = \mathrm{poly}(1/\epsilon)$, it is possible to operate within $\epsilon$ of capacity with an encoding
cost of $\mathrm{poly}(1/\epsilon)$ and a decoding cost of $\exp(\mathrm{poly}(1/\epsilon))$ for both channel models. These costs are comparable
to the computational efficiency achieved by Forney's concatenation construction [7], as described in
Guruswami's survey of iterative decoding methods [12] ($n/\epsilon^{O(1)}$ for encoding and $n2^{1/\epsilon^{O(1)}}$ for decoding).
However, the key advantage of spinal codes is that they are rateless, unlike all known good and efficient 
codes for the BSC; and they are arguably more elegant than the concatenation construction. We have 
implemented spinal codes in both software and hardware (FPGA) to demonstrate their practicality 
and high throughput, allowing us to project that a silicon implementation of the design will run at 50 
Mbits/s (commercial 802.11b/g speeds) [18] . The experimental results should alleviate concerns about 
the super-linearity of the encoder and decoder being a barrier to their practical usefulness. 

Method and proof technique. The key idea is to use the error exponents of random codes as a 
building block. We apply Gallager's result characterizing the random coding error exponent for any 
memoryless channel [10, Chapters 5, 7]. That result, though established for random codes where the 
codewords for distinct messages are mutually independent, applies even if only pairwise independence 
between the coded bits holds. The application of this idea to analyze spinal codes is somewhat remarkable
because many coded bits of two distinct messages are likely to be highly dependent. The rest of the
proof uses probabilistic analysis leveraging the SAC property of the hash function as a de-randomization
strategy, establishing that a sequentially structured code can achieve capacity.

2 Overview of Spinal Codes 

This section describes the encoder (§2.1) and decoder (§2.2) for spinal codes, which are variants of the
methods introduced in [17]. Our discussion here is in the context of the BSC, but the same approach
with one addition (direct coding to symbols) works for the AWGN channel, as described in §4.

2.1 Encoder 

The encoder maps $n$ input message bits, $m = (m_1, \ldots, m_n)$, to a stream of coded bits, $x_1(m), x_2(m), \ldots$.
These coded bits are transmitted in sequence until the receiver signals that it is done decoding.

Hash function. The core of the code is a hash function, $h$, which takes two inputs, a $v$-bit state and
$k$ message bits, and maps them to a new $v$-bit state. That is, $h: \{0,1\}^v \times \{0,1\}^k \to \{0,1\}^v$. We choose
$h$ uniformly at random, based on a random seed, from $\mathcal{H}$, a family of hash functions with the pairwise
independence property (cf. [15]): each input $x \in \{0,1\}^v \times \{0,1\}^k$ is mapped uniformly at random (the randomness
being induced by the selection of the random seed) to $\{0,1\}^v$, and for any distinct inputs $x \neq x'$,

$$P(h(x) = y, h(x') = y') = P(h(x) = y)\,P(h(x') = y') = 2^{-2v}, \qquad (1)$$

for any $y, y' \in \{0,1\}^v$.
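The text does not say which pairwise-independent family is used. As a minimal sketch, assuming one textbook choice (a uniformly random affine map $h(x) = Ax \oplus b$ over GF(2), which is pairwise independent), the following Python shows how such an $h$ can be drawn from a seed and checks Eq. (1) exhaustively for tiny $v$ and $k$. The names `AffineGF2Hash`, `draw`, and `pair_counts` are hypothetical.

```python
from __future__ import annotations

import random
from collections import Counter
from itertools import product


def parity(x: int) -> int:
    """XOR of all bits of the non-negative integer x (a GF(2) inner product)."""
    return bin(x).count("1") & 1


class AffineGF2Hash:
    """Hypothetical pairwise-independent family: h(x) = A x XOR b over GF(2).

    The input x is the v-bit state concatenated with the k message bits.
    Row j of the v x (v + k) matrix A is stored as an integer bit mask and
    b is a v-bit offset; drawing A and b uniformly plays the role of the seed.
    """

    def __init__(self, v: int, k: int, rows: list[int], offset: int) -> None:
        self.v, self.k = v, k
        self.rows = rows
        self.offset = offset

    @classmethod
    def draw(cls, v: int, k: int, rng: random.Random) -> AffineGF2Hash:
        rows = [rng.getrandbits(v + k) for _ in range(v)]
        return cls(v, k, rows, rng.getrandbits(v))

    def __call__(self, state: int, block: int) -> int:
        x = (state << self.k) | block  # concatenate state and message block
        out = 0
        for row in self.rows:  # one output bit per row, MSB first
            out = (out << 1) | parity(row & x)
        return out ^ self.offset


def pair_counts(v: int, k: int, x1: tuple[int, int], x2: tuple[int, int]) -> Counter:
    """Enumerate every (A, b) for tiny v, k and count the pairs (h(x1), h(x2))."""
    counts: Counter = Counter()
    for rows in product(range(2 ** (v + k)), repeat=v):
        for b in range(2 ** v):
            h = AffineGF2Hash(v, k, list(rows), b)
            counts[(h(*x1), h(*x2))] += 1
    return counts
```

For $v = 2$, $k = 1$ there are $2^8$ choices of $(A, b)$, and for two distinct inputs each of the 16 possible output pairs occurs exactly 16 times, i.e., with probability $2^{-2v}$.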

Spine. $h$ is applied sequentially to $k$ non-overlapping message bits at a time, producing a sequence
of $v$-bit states called the spine. The initial state is $s_0 = 0$. Let $\bar{m}_i = (m_{ki+1}, \ldots, m_{k(i+1)})$, $0 \le i < n/k$, denote the $(i+1)$-th
$k$-bit block of the message $m$. Then, as shown in Figure 1, each successive $v$-bit value in the spine is
generated as

$$s_i = h(s_{i-1}, \bar{m}_{i-1}), \qquad 1 \le i \le n/k.$$
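The spine recursion can be sketched in a few lines. This is illustrative only: keyed BLAKE2b from Python's `hashlib` stands in for $h$ (a pseudo-random placeholder, not a proven pairwise-independent family), and the seed, $s_0 = 0$, and helper names are assumptions.

```python
from __future__ import annotations

import hashlib


def keyed_hash(seed: bytes, state: int, block: int, v: int, k: int) -> int:
    """Stand-in for h: {0,1}^v x {0,1}^k -> {0,1}^v.

    Assumption: keyed BLAKE2b truncated to its top v bits (v <= 64). This is
    a pseudo-random placeholder, not a proven pairwise-independent family.
    """
    x = (state << k) | block
    data = x.to_bytes((v + k + 7) // 8, "big")
    digest = hashlib.blake2b(data, digest_size=8, key=seed).digest()
    return int.from_bytes(digest, "big") >> (64 - v)


def message_blocks(bits: list[int], k: int) -> list[int]:
    """Split the message into k-bit integers m_bar_0, m_bar_1, ... (MSB first)."""
    if len(bits) % k:
        raise ValueError("message length must be a multiple of k")
    blocks = []
    for start in range(0, len(bits), k):
        value = 0
        for bit in bits[start:start + k]:
            value = (value << 1) | bit
        blocks.append(value)
    return blocks


def spine(bits: list[int], k: int, v: int, seed: bytes = b"spinal-demo", s0: int = 0) -> list[int]:
    """Return [s_1, ..., s_{n/k}] with s_i = h(s_{i-1}, m_bar_{i-1})."""
    values, state = [], s0
    for block in message_blocks(bits, k):
        state = keyed_hash(seed, state, block, v, k)
        values.append(state)
    return values
```

Two messages that agree on their first two blocks share $s_1, s_2$ and can only diverge from $s_3$ onward.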



Generating coded bits. The encoder uses the spine values $s_1, \ldots, s_{n/k}$ to produce coded bits in
passes. In the first pass, it extracts the most significant bit from each $v$-bit spine value to produce $n/k$
coded bits $x_1, \ldots, x_{n/k}$. In general, in the $\ell$-th pass, the encoder extracts the $\ell$-th most significant bit
of each spine value $s_1, \ldots, s_{n/k}$, producing coded bits $x_{(\ell-1)n/k+1}, \ldots, x_{\ell n/k}$. The coding parameters $k$ and $v$
determine various properties of the code. The rate achieved by the code at the end of the $\ell$-th
pass is $R_\ell = k/\ell$; since there are at most $v$ passes, the lowest achievable rate is $k/v$.
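The pass structure can be sketched as follows, assuming spine values are stored as Python integers whose bit $v-1$ is the most significant; `pass_bits`, `coded_stream`, and `rate_after` are hypothetical names.

```python
from __future__ import annotations


def pass_bits(spine_values: list[int], v: int, ell: int) -> list[int]:
    """Coded bits of pass ell (1-indexed): the ell-th most significant bit of each v-bit spine value."""
    if not 1 <= ell <= v:
        raise ValueError("pass index must lie in 1..v")
    shift = v - ell
    return [(s >> shift) & 1 for s in spine_values]


def coded_stream(spine_values: list[int], v: int, passes: int) -> list[int]:
    """Concatenate passes 1..passes, so x_{(ell-1)n/k + i} is derived from spine value s_i."""
    stream: list[int] = []
    for ell in range(1, passes + 1):
        stream.extend(pass_bits(spine_values, v, ell))
    return stream


def rate_after(k: int, ell: int) -> float:
    """n message bits over ell * (n/k) coded bits gives R_ell = k / ell."""
    return k / ell
```

For example, with $v = 4$ and spine values `0b1010`, `0b0110`, the first two passes emit `[1, 0]` and then `[0, 1]`.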

Sequential structure of the code. The combination of the encoder's iterative structure and the 
SAC property of the hash function gives the code a unique balance. On the one hand, two messages 
that differ by one or more bits will have very different codewords, allowing analysis using random coding 
techniques. On the other hand, this divergence in the output is structured in such a way as to allow
an efficient decoder. 

In a spinal code, the output bits $x_i, x_{i+n/k}, x_{i+2n/k}, \ldots$ are fully determined by the first $ik$ bits of the
message $m$. Two messages that first differ in the $i$-th block of $k$ bits have the same first $i - 1$ spine values,
and have statistically independent subsequent spine values (i.e., the later values are "very different").
[Figure 1 placeholder: the message is divided into blocks of k bits (k = 3 in this example); the hash function h satisfies the strict avalanche criterion. The spine is the sequence of outputs from the hash function: each spine value is produced by applying the hash function to the previous spine value and the corresponding message block.]

Figure 1: Encoder for the BSC. Each dark square is a "1", each white square is a "0".



2.2 Decoder 

Decoding over a tree. Maximum likelihood (ML) decoding over the BSC boils down to a search for
the message whose encoding is nearest, in Hamming distance, to the received bits. Because the spinal
encoder applies the hash function sequentially, input messages with a common prefix will also have a
common spine prefix. The key to exploiting this structure is to decompose the total distance into a
sum over spine values. If we break the received bits $y$ into sub-vectors $y_1, \ldots, y_{n/k}$ containing the bits
produced from the spine values $s_1, \ldots, s_{n/k}$ of the correct message, and similarly break $x(m')$ for a candidate
message $m'$ into $n/k$ vectors of bits $x_1(s'_1), \ldots, x_{n/k}(s'_{n/k})$ that depend on the spine values $s'_1, \ldots, s'_{n/k}$
(corresponding to message $m'$), then the cost function decomposes as

$$d_H(y, x(m')) = \sum_{i=1}^{n/k} d_H\big(y_i, x_i(s'_i)\big). \qquad (2)$$

A summand $d_H(y_i, x_i(s'_i))$ only needs to be computed once for all messages that share the same spine
value $s'_i$. The following algorithm takes advantage of this property.

Ignoring hash function collisions (as established in the proof of Theorem 1, these happen with very
low probability), decoding can be recast as a search over a tree of message prefixes. The root of this
decoding tree is $s_0$, and corresponds to the zero-length message. Each node at depth $d$ corresponds to
a prefix of length $kd$ bits, and is labeled with the final spine value of that prefix. Every node has
$2^k$ children, connected by edges $e = (s_d, s_{d+1})$ representing a choice of $k$ message bits $\bar{m}_e$. As in the
encoder, $s_{d+1} = h(s_d, \bar{m}_e)$. By walking back up the tree to the root and reading $k$ bits from each edge,
we can find the message prefix for a given node.

To the edge incident on node $s_d$, we assign a branch cost $d_H(y_d, x_d(s_d))$. Summing the branch costs
on the path from the root to a node gives the path cost of that node, equivalent to the sum in Eq. (2).
The ML decoder finds the leaf with the lowest cost, and returns the corresponding complete message. 
The sender continues to send successive passes until the receiver signals that the message has been 
decoded correctly. The receiver stores all the symbols it receives until the message is decoded correctly. 
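The decomposition in Eq. (2), on which the tree search relies, only depends on how the received stream is regrouped by spine value. A small sketch, assuming the pass ordering of §2.1 (all of pass 1, then all of pass 2, and so on); the helper names are hypothetical.

```python
from __future__ import annotations


def hamming(a: list[int], b: list[int]) -> int:
    """Number of positions where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def per_spine_chunks(stream: list[int], num_spines: int, passes: int) -> list[list[int]]:
    """Regroup a pass-ordered stream into y_1, ..., y_{n/k}.

    Assumes bit i of pass p (both 0-indexed) sits at index p * num_spines + i,
    so chunk i collects every bit derived from spine value s_{i+1}.
    """
    return [[stream[p * num_spines + i] for p in range(passes)] for i in range(num_spines)]


def path_cost(received: list[int], candidate: list[int], num_spines: int, passes: int) -> int:
    """Right-hand side of Eq. (2): the sum of per-spine branch costs."""
    y_chunks = per_spine_chunks(received, num_spines, passes)
    x_chunks = per_spine_chunks(candidate, num_spines, passes)
    return sum(hamming(y, x) for y, x in zip(y_chunks, x_chunks))
```

Because each chunk depends only on one spine value, a branch cost can be computed once and shared by every candidate message passing through that node.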
Pruning the tree. Decoding along the full tree has exponential complexity. A natural greedy approximation
is to prune the tree by maintaining a small number of candidates with the lowest path costs at
each depth, while exploring the tree from root to leaves. Iteratively, at each depth, expand the retained
(up to) $B$ candidates into $B2^k$ possible candidates at the next depth of the tree. Compute the path cost
of all of these $B2^k$ candidates and retain the $B$ with the lowest path cost (breaking
ties arbitrarily). We use the term beam width to refer to the parameter $B$, as this tree exploration and
pruning method is called beam search [22] in AI, and is known as the M-algorithm pQ in the coding literature,
where it has been proposed for decoding convolutional codes. We show the somewhat surprising
and noteworthy result that this simple greedy method essentially achieves channel capacity when used
for spinal decoding.
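A minimal sketch of the pruned tree search, written to mirror the description above rather than to be fast. Assumptions: keyed BLAKE2b stands in for $h$, $s_0 = 0$, the received bits are already grouped per spine value as in Eq. (2), and ties are broken by Python's stable sort; all names are hypothetical.

```python
from __future__ import annotations

import hashlib
import random


def h(state: int, block: int, v: int, k: int, seed: bytes = b"demo") -> int:
    """Stand-in hash (assumption: keyed BLAKE2b truncated to v <= 64 bits)."""
    data = ((state << k) | block).to_bytes((v + k + 7) // 8, "big")
    digest = hashlib.blake2b(data, digest_size=8, key=seed).digest()
    return int.from_bytes(digest, "big") >> (64 - v)


def node_bits(s: int, v: int, passes: int) -> list[int]:
    """Bits x_i(s) contributed by one spine value over `passes` passes (MSB first)."""
    return [(s >> (v - 1 - p)) & 1 for p in range(passes)]


def encode(blocks: list[int], k: int, v: int, passes: int) -> list[list[int]]:
    """Per spine value, the coded bits it contributes (the y_i grouping of Eq. (2))."""
    out, s = [], 0  # s_0 = 0
    for b in blocks:
        s = h(s, b, v, k)
        out.append(node_bits(s, v, passes))
    return out


def beam_decode(received: list[list[int]], k: int, v: int, B: int) -> list[int]:
    """Beam search (M-algorithm) over the spinal decoding tree.

    Each beam entry is (path_cost, spine_value, decoded_blocks). At every
    depth, each survivor is expanded into 2**k children, the Hamming branch
    cost is added, and the B cheapest children are kept.
    """
    passes = len(received[0])
    beam = [(0, 0, [])]  # root: zero-length prefix, s_0 = 0
    for y_d in received:
        children = []
        for cost, s, blocks in beam:
            for b in range(2 ** k):
                s_next = h(s, b, v, k)
                bits = node_bits(s_next, v, passes)
                branch = sum(xb != yb for xb, yb in zip(bits, y_d))
                children.append((cost + branch, s_next, blocks + [b]))
        children.sort(key=lambda c: c[0])  # stable: ties keep expansion order
        beam = children[:B]
    return beam[0][2]  # blocks of the lowest-cost leaf


if __name__ == "__main__":
    rng = random.Random(1)
    msg = [rng.randrange(4) for _ in range(8)]
    rx = encode(msg, k=2, v=32, passes=24)
    rx[2][5] ^= 1  # two channel bit flips
    rx[6][0] ^= 1
    print(beam_decode(rx, k=2, v=32, B=16) == msg)
```

Choosing $B \ge 2^n$ prunes nothing and recovers the exhaustive ML search over the full tree; a small $B$ gives the polynomial cost discussed in the complexity paragraph.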

Encoding and decoding complexity. The encoder produces $n/k$ spine values, each with $v$ bits.
Since the cost of producing $v$ hash bits from a $v$-bit and $k$-bit input is $O(v + k)$, the encoding cost (due
to hash function calculations) scales as $O((v + k)n/k) = O(n(1 + v/k))$. The decoder uses the pruned
tree search over a tree of depth $n/k$, with each depth requiring sorting $B \cdot 2^k$ numbers as well as $B \cdot 2^k$ hash
operations. Therefore, the total decoding cost scales as $O(nB2^k(k + \log B + v))$.
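The operation counts above can be tabulated directly. This sketch assumes unit cost per hash bit and per comparison, so the numbers only illustrate the scaling, not running time; the function names are hypothetical.

```python
import math


def encode_ops(n: int, k: int, v: int) -> int:
    """n/k hash evaluations at O(v + k) each: (v + k) n / k = n (1 + v/k)."""
    return (n // k) * (v + k)


def decode_ops(n: int, k: int, v: int, B: int) -> float:
    """Per depth: B*2^k hashes at O(v + k) each, plus sorting B*2^k numbers.

    Sorting costs B 2^k log2(B 2^k) = B 2^k (k + log2 B); summed over the
    n/k depths this stays below the bound n B 2^k (k + log B + v).
    """
    children = B * 2 ** k
    per_depth = children * (v + k) + children * math.log2(children)
    return (n // k) * per_depth
```

For instance, $n = 256$, $k = 4$, $v = 32$, $B = 16$ gives 2304 encoder units and 720896 decoder units under this model.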



3 Performance of Spinal Codes over the BSC 

The principal result of this section is a proof of Theorem 1 (stated below), showing that the polynomial-time
encoder and greedy tree-pruning decoder for spinal codes achieve Shannon capacity over the BSC.

Model: Memoryless Channel in Discrete Time. A noisy channel is described by an input
alphabet $\mathcal{I}$, an output alphabet $\mathcal{O}$, and a collection of probability measures $\mathcal{P} = (P_i, i \in \mathcal{I})$, defined
over $\mathcal{O}$: when input $i \in \mathcal{I}$ is transmitted over the channel, the received output is distributed over $\mathcal{O}$
according to $P_i$. The communication channel is memoryless: the output of the channel at any time
depends only on the input at that time, independent of past transmissions. That is, when $x_1, \ldots, x_T$
are transmitted on the channel, the probability (density) that the output is $y_1, \ldots, y_T$ is $\prod_{t=1}^{T} P_{x_t}(y_t)$.
The BSC is memoryless. In a BSC with bit-flip probability $p \in (0, 1/2)$, $\mathcal{I} = \mathcal{O} = \{0, 1\}$, $P_0(0) =
P_1(1) = 1 - p$, and $P_0(1) = P_1(0) = p$.
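A small sketch of the BSC model and its capacity, assuming bits are Python integers 0/1; the function names are hypothetical.

```python
from __future__ import annotations

import math
import random


def binary_entropy(p: float) -> float:
    """H(p) = -p log2 p - (1 - p) log2(1 - p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(p: float) -> float:
    """C = 1 - H(p) bits per channel use."""
    return 1.0 - binary_entropy(p)


def bsc_transmit(bits: list[int], p: float, rng: random.Random) -> list[int]:
    """Memoryless BSC: flip each bit independently with probability p."""
    return [b ^ int(rng.random() < p) for b in bits]


def bsc_likelihood(x: list[int], y: list[int], p: float) -> float:
    """prod_t P_{x_t}(y_t), where P_i(i) = 1 - p and P_i(1 - i) = p."""
    return math.prod(p if xt != yt else 1 - p for xt, yt in zip(x, y))
```

For example, $p = 0.11$ gives $C \approx 0.5$ bits per channel use.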

Theorem 1. Consider an $n$-bit message encoded with a spinal code with $k \ge 1$ and $v = \Theta(k^2\log n)$
operating over a BSC with parameter $p \in (0, 1/2)$. Then, the greedy decoder with $B = n^{O(k^3)}$ decodes
all but the last $O(k^3\log n)$ message bits successfully with probability at least $1 - 1/n^2$, achieving a rate

$$R \ge C - O\!\left(\frac{C^2}{k}\right), \quad \text{where } C = 1 - H(p). \qquad (3)$$

The randomness in Theorem 1 is induced by the channel conditions and the code construction. For
$n = \omega(k^3\log n)$, the theorem says that essentially all bits are decoded (to decode all the bits, we can
append $O(k^3\log n)$ "tail" bits to the end of each input message). For $n > k^5$, the loss of rate due to
these tail bits is $O((k^4\log n)/n) = o(1/k)$. Therefore, the code achieves a rate within $O(1/k)$ of the
capacity of the BSC, making it a good rateless code. The encoder complexity scales as $O(nk\log n)$; the
decoder complexity scales as $n^{O(k^3)}$.
Proof plan. The rest of this section establishes this result with the following plan. We start by
recalling Gallager's result on the probability of error for a random code, which requires the codewords
associated with distinct messages to be completely independent. We present a useful variant of this
result, which requires only pairwise independence (a property that we show is satisfied by sufficiently different
messages under the application of the hash function; see Proposition 5). We then discuss a corollary of the
result for a code operating at a rate close to capacity, and establish that spinal codes can operate at
a rate near capacity (so that the corollary applies). Finally, we use these propositions to prove
Theorem 1 in two stages: first, assuming no hash function collisions, and then showing that the collision
probability is small.

3.1 Error probability of random codes 

The random code for a message of $n$ bits is constructed using a distribution $Q$ over the input symbols.
For the BSC, the input symbols are $\{0, 1\}$ and a capacity-achieving random code uses $Q$ such that
$Q(0) = Q(1) = 1/2$. The code maps an $n$-bit message, $m \in \{0,1\}^n$, to a $T$-symbol codeword $x(m) =
(x_1(m), \ldots, x_T(m))$ by drawing each of the $x_t(m)$, $1 \le t \le T$, independently at random according to
$Q$. In the random code introduced by Shannon and considered by Gallager, all $x(m)$ are mutually independent
across $m \in \{0,1\}^n$. We consider a random code with only pairwise independence across messages.

Property 1 (Pairwise independent random code for the BSC). A code that maps every $n$-bit message
$m \in \{0,1\}^n$ to a random codeword of $T$ bits, $x(m)$, so that (i) for a given $m$, $x_1(m), \ldots, x_T(m)$ are
i.i.d. and uniformly distributed over $\{0,1\}$, (ii) for any $m \ne m'$, $x(m)$ and $x(m')$ are independent of
each other, and (iii) the joint distribution of all codewords is symmetric.

For pairwise independent random codes, the following variant of Gallager's error-exponent result [9],
[10, Theorem 5.6.1, Example 1] holds (proof in Appendix C):

Lemma 2. Consider a BSC with parameter $p \in (0, 1/2)$ and capacity $C = 1 - H(p)$. Given a pairwise
independent random code for the BSC with message length $n$, code length $T$, and rate $R = n/T < C$, let
the decoder operate using the maximum likelihood (ML) rule to produce an estimate $\hat{m}$ when message
$m \in \{0,1\}^n$ is transmitted. Then the probability of decoding error, $P_e = 2^{-n}\sum_{m \in \{0,1\}^n} P(\hat{m} \ne m)$,
for $R = 1 - H(q)$ with $p < q$, satisfies:

(a) $P_e \le 2^{-TD(q\|p)}$, where $D(q\|p) = q\log\frac{q}{p} + (1-q)\log\frac{1-q}{1-p}$, if $q \le \frac{\sqrt{p}}{\sqrt{p} + \sqrt{1-p}}$; and

(b) $P_e \le 2^{-T(1 - R - 2\log(\sqrt{p} + \sqrt{1-p}))}$ otherwise.
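Lemma 2's two regimes can be evaluated numerically. This sketch simply codes the two exponents stated above (logs to base 2, per footnote 1) and checks that they agree at the boundary $q = \sqrt{p}/(\sqrt{p} + \sqrt{1-p})$; the function names are hypothetical.

```python
import math


def binary_entropy(x: float) -> float:
    """H(x) in bits, for 0 < x < 1."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def kl_binary(q: float, p: float) -> float:
    """D(q || p) = q log(q/p) + (1 - q) log((1 - q)/(1 - p)), in bits."""
    return q * math.log2(q / p) + (1 - q) * math.log2((1 - q) / (1 - p))


def critical_q(p: float) -> float:
    """Boundary between cases (a) and (b): sqrt(p) / (sqrt(p) + sqrt(1 - p))."""
    return math.sqrt(p) / (math.sqrt(p) + math.sqrt(1 - p))


def bsc_error_exponent(q: float, p: float) -> float:
    """Exponent E with P_e <= 2^{-T E} at rate R = 1 - H(q), for p < q < 1/2."""
    if not p < q < 0.5:
        raise ValueError("need p < q < 1/2")
    if q <= critical_q(p):
        return kl_binary(q, p)  # case (a)
    rate = 1 - binary_entropy(q)
    return 1 - rate - 2 * math.log2(math.sqrt(p) + math.sqrt(1 - p))  # case (b)


def error_bound(T: int, q: float, p: float) -> float:
    """Upper bound 2^{-T E} on the ML decoding error probability."""
    return 2.0 ** (-T * bsc_error_exponent(q, p))
```

The agreement at the boundary is exact: at the critical $q$, $D(q\|p) = H(q) - 2\log(\sqrt{p} + \sqrt{1-p})$, which is the case (b) exponent with $R = 1 - H(q)$.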

3.2 Error probability at rates R close to capacity C 

Lemma 3. Consider the same setup as Lemma 2 with rate $R = n/T = 1 - H(q)$ close to capacity
$C = 1 - H(p)$, that is, $q \approx p$. Then,

$$P_e \le 2^{-TD(q\|p)} \approx 2^{-T\kappa(C - R)^2}, \qquad (4)$$

where $\kappa^{-1} = \Theta\big(p(1-p)\big(\log\frac{1-p}{p}\big)^2\big)$.

Proof. From Lemma 2, for all $q$ close enough to $p$, $P_e \le 2^{-TD(q\|p)}$. Now, consider $p$ fixed and let
$F(q) = D(q\|p)$ be a function of $q$. Then, by Taylor's expansion of $F(q)$ around $p$,

$$F(q) = F(p) + F'(p)(q - p) + F''(\theta)\frac{(q - p)^2}{2}, \qquad (5)$$
for some $\theta \in [p, q]$. Noting that $F'(x) = \log\frac{x(1-p)}{(1-x)p}$ and $F''(x) = \frac{1}{x(1-x)\ln 2}$, we see that $F(p) = F'(p) = 0$,
and that for $q \approx p$,

$$D(q\|p) \approx \frac{(q - p)^2}{2p(1-p)\ln 2}. \qquad (6)$$

For the entropy function $H(x) = -x\log x - (1-x)\log(1-x)$, using the first-order Taylor expansion,
we obtain that for $q \approx p$,

$$H(q) \approx H(p) + \log\Big(\frac{1-p}{p}\Big)(q - p). \qquad (7)$$

Since $R = 1 - H(q)$ and $C = 1 - H(p)$,

$$(C - R)^2 = \big(H(q) - H(p)\big)^2 \approx \Big(\log\frac{1-p}{p}\Big)^2 (q - p)^2. \qquad (8)$$

The desired claim follows from (6) and (8). □
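The quadratic approximation in Lemma 3 can be checked numerically. The sketch below assumes the leading-order constant $\kappa^{-1} = 2\ln 2\, p(1-p)\big(\log\frac{1-p}{p}\big)^2$ obtained by combining (6) and (8), and confirms that the ratio $D(q\|p)/(\kappa(C-R)^2)$ approaches 1 as $q \to p$; names are hypothetical.

```python
import math


def binary_entropy(x: float) -> float:
    """H(x) in bits, for 0 < x < 1."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def kl_binary(q: float, p: float) -> float:
    """D(q || p) in bits."""
    return q * math.log2(q / p) + (1 - q) * math.log2((1 - q) / (1 - p))


def kappa(p: float) -> float:
    """From (6) and (8): kappa^{-1} = 2 ln 2 * p (1 - p) * log2((1 - p)/p)^2."""
    return 1.0 / (2 * math.log(2) * p * (1 - p) * math.log2((1 - p) / p) ** 2)


def approximation_ratio(p: float, delta: float) -> float:
    """D(q || p) / (kappa (C - R)^2) at q = p + delta; tends to 1 as delta -> 0."""
    q = p + delta
    gap = binary_entropy(q) - binary_entropy(p)  # C - R = H(q) - H(p)
    return kl_binary(q, p) / (kappa(p) * gap ** 2)
```

At $p = 0.1$, the ratio is within about $10^{-3}$ of 1 for $q - p = 10^{-4}$, but noticeably farther from 1 for $q - p = 10^{-2}$, as expected from a second-order approximation.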

3.3 Rates achievable by spinal codes 

The following claim shows that a spinal code over the BSC can achieve rates arbitrarily close to the 
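channel capacity, $C$, for large $k$. Hence, Lemma 3 is applicable.

Claim 4. There exists $L > 1$ so that the rate induced by the spinal code at the end of pass $L$ satisfies

$$C - R = \Theta\left(\frac{C^2}{k}\right).$$

Proof. Consider an integer $L$ such that $\frac{k}{C} + 1 \le L < \frac{k}{C} + 2$; such an $L$ exists because the interval has length 1,
and $L \ge 2$ since $k \ge 1 \ge C$. With $R = k/L$, these conditions may be rewritten as

$$C - \frac{2C}{L} < R \le C - \frac{C}{L}.$$

Hence, $C - R = \Theta(C/L)$. Moreover, since $k/C \ge 1$, we have $k/C < L < 3k/C$, so $L = \Theta(k/C)$. Together, $C - R = \Theta(C^2/k)$. □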
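The choice of $L$ in the proof of Claim 4 is constructive. A sketch, assuming $C$ is given as a float and taking the smallest admissible $L$; the function names are hypothetical.

```python
from __future__ import annotations

import math


def choose_passes(k: int, C: float) -> int:
    """Smallest integer L >= k/C + 1, so that k/C + 1 <= L < k/C + 2 (Claim 4)."""
    return math.ceil(k / C + 1)


def claim4_gap(k: int, C: float) -> tuple[int, float]:
    """Return (L, C - R) for the rate R = k / L reached at the end of pass L."""
    L = choose_passes(k, C)
    return L, C - k / L
```

For example, $C = 0.5$ and $k = 4$ give $L = 9$ and $R = 4/9$, so $C - R = 1/18$, which lies between $C^2/(3k)$ and $2C^2/k$.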

3.4 Proof of Theorem 1

We now establish that by the end of pass $L$, chosen as above, decoding succeeds with high probability.
We shall prove that if $B = n^{O(k^3)}$, then with high probability, for $i^* = \Theta(k^2C^{-4}\kappa^{-1}\log n)$, when processing
the $i$-th spine value, all non-pruned codewords either agree with the first $i - i^*$ true spine values (so there
are fewer than $B$ of them), or are less likely than the true spine (so they cannot cause the true spine to be
pruned out). As a consequence, the true spine is never pruned, so the decoder manages to decode all
but $i^*k = \Theta(k^3C^{-4}\kappa^{-1}\log n)$ bits.

The following proposition is a consequence of the pairwise independence property of the hash function.

Proposition 5. Let $m, m'$ be two messages differing in message block $\bar{m}_i$. Let $\{s_j\}$ and $\{s'_j\}$ be the
spines for $m$ and $m'$, respectively. Then,

$$P\big(\exists\, j \in \{1, \ldots, g\} : s_{i+j} = s'_{i+j}\big) \le g \cdot 2^{-v}.$$

If such a $j$ does not exist, then all the bits of $s_{i+1}, \ldots, s_{i+g}$ are independent of the bits of $s'_{i+1}, \ldots, s'_{i+g}$, and
each of them is independently and uniformly distributed.
Proof. Due to the pairwise independence property of the hash function, when two different inputs are
passed through the hash function, the corresponding output bits are independent
of each other and each of them is distributed independently and uniformly. Therefore, the chance of
two different inputs producing the same output is $2^{-v}$. By the union bound, the probability of such
an event happening over a series of $g$ spine values is bounded by $g \cdot 2^{-v}$. By iteratively applying the
property that when spine values differ at some stage $t$, the bits produced at stage $t + 1$ are independent
and uniformly distributed, we conclude that if all spine values differ, their bits are independent and
uniformly distributed. □

Proving Theorem 1 assuming no collisions. We establish Theorem 1 assuming no hash function
collisions; later we show that collisions happen with low probability. We require $v = \Theta(k^2\log n)$ with
a large-enough constant multiplier in the $\Theta(\cdot)$ term. Throughout, we assume that $m \in \{0,1\}^n$
is a fixed message that was transmitted. Establishing that with high probability (with respect to all
randomness in code construction and channel noise) it gets decoded will imply that all messages get decoded
with high probability, due to the symmetry of the random code and the memoryless property of the BSC noise
model (or, more generally, of any memoryless channel).

Lemma 6. Consider the greedy spinal decoder operating after all coded bits of the $L$ passes are received.
Assuming no hash collisions, the decoder decodes all but the last $O(k^3\log n)$ bits correctly with probability
$1 - O(1/n^4)$.

Proof. Consider the message $m$ that was transmitted and any other message $m'$ that differs from $m$ in
any of the first $k$ bits. In the absence of hash collisions, by Proposition 5, the codewords of $m$ and $m'$ are
independent of each other and each of their bits is independent and uniformly distributed over $\{0, 1\}$.
That is, $m$ and $m'$ satisfy Property 1.

If we restrict our attention to codewords generated from the first $i$ spine values, that is, codewords
of length $N = iL$, there are $(2^k - 1)2^{(i-1)k} < 2^{ik}$ such codewords, one for each message prefix $m'$ that differs from $m$ in any of
the first $k$ bits. As established above, the pair $m$ and any such $m'$ satisfies Property 1. Using Lemmas
2 and 3, we obtain that the probability that any of these messages (that differ from $m$ in any of the
first $k$ bits) is more likely than the original message $m$ is bounded above by $P_e(i)$, where (with $k$ large
enough for Lemma 3 to be applicable)

$$P_e(i) = 2^{-N\kappa(C-R)^2} = 2^{-iL\kappa(C-R)^2}. \qquad (9)$$

By Claim 4, $L(C-R)^2 = \Theta(C^3/k)$, so for $i^* = \Theta(k^2C^{-4}\kappa^{-1}\log n)$ (with a suitably large constant factor in the
$\Theta(\cdot)$ term), the exponent in (9) is at least $6\log n$ and the probability of such an error is bounded above by $1/n^6$.
Therefore, after processing the first $i^*$ spine values, the only messages that can have a higher likelihood than
the original message are those that do not differ from $m$ in the first $k$ bits. There are at most $2^{i^*k - k}$
such messages, and hence if $B = 2^{i^*k} = n^{O(k^3)}$, then the original message will not be pruned out.
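To get a feel for the window $i^*$ and the beam width $B = 2^{i^*k}$, one can plug Claim 4 and the approximation of Lemma 3 into (9). This is only illustrative: it treats Lemma 3's approximation as exact, ignores the unspecified constants in the $\Theta(\cdot)$ terms, and computes the smallest depth at which (9) drops below $n^{-6}$, which need not match the $i^*$ chosen in the proof. Names are hypothetical.

```python
from __future__ import annotations

import math


def binary_entropy(x: float) -> float:
    """H(x) in bits, for 0 < x < 1."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def kappa(p: float) -> float:
    """Leading-order constant from Lemma 3: kappa^{-1} = 2 ln 2 p (1 - p) log2((1 - p)/p)^2."""
    return 1.0 / (2 * math.log(2) * p * (1 - p) * math.log2((1 - p) / p) ** 2)


def window_depth(n: int, k: int, p: float) -> tuple[int, int, float]:
    """Illustrative i*: smallest i with 2^{-i L kappa (C - R)^2} <= n^{-6}.

    L is chosen as in Claim 4. Returns (i_star, L, exponent per spine value);
    log2 of the corresponding beam width is i_star * k.
    """
    C = 1 - binary_entropy(p)
    L = math.ceil(k / C + 1)
    gap = C - k / L
    per_spine = L * kappa(p) * gap ** 2
    i_star = math.ceil(6 * math.log2(n) / per_spine)
    return i_star, L, per_spine
```

The required depth grows linearly in $\log n$, which is why $B = 2^{i^*k}$ is polynomial in $n$ for fixed $k$ and $p$.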

Now we apply the above argument inductively. Consider a stage $j$ where the only messages that are
not pruned out and have likelihood higher than the original message $m$ are those that differ from $m$ only in
bit positions between $jk - i^*k + 1$ and $jk$. When the decoder moves to stage $j + 1$, the messages that
are not pruned out are expanded by a factor of $2^k$. Among these, consider the messages that start differing
from the original message in any of the $k$ bit positions $jk - i^*k + 1, \ldots, jk - i^*k + k$. By applying the
same argument as above, it follows that at the end of stage $j + 1$, all of these (fewer than $2^{i^*k}$) messages
will have likelihood smaller than the original message with probability at least $1 - 1/n^6$.

The above invariant, together with the union bound over the $n/k$ stages, implies that at the end of stage $n/k$, the original
message is preserved among the $B$ candidates with probability at least $1 - O(1/n^5)$. Further, the most likely
$2^{i^*k}$ of these $B$ candidates are those that have the correct $n - i^*k$ prefix bits. That is, the decoder manages
to decode all but the last $O(i^*k) = O(k^3\log n)$ bits correctly. □



Dealing with collisions. The above proof uses the fact that, in the absence of collisions, given the
original message $m$ of interest and any other message $m'$ that differs from $m$ in the first $k$ bits, their
corresponding codewords $x = x(m)$ and $x' = x(m')$ satisfy Property 1. Therefore, the probability of
any such message $m'$ having likelihood higher than $m$ is at most $O(1/n^6)$, as desired. Note that this
is precisely the argument that is used inductively, along with the union bound, to establish the claim.
Therefore, it is sufficient to establish that the effect of collisions is negligible for this step only.

We wish to show that the effect of collisions is small, using the following plan. As stated in Lemma
7, we identify an event $\mathcal{E}$ such that, conditioned on it, Property 1 is satisfied as above, and
the probability of $\mathcal{E}^c$ is $O(1/n^6)$. Using this, we establish that the probability of any such
message $m'$ having likelihood higher than $m$ remains at most $O(1/n^6)$, as desired.

Lemma 7. Let $m$ be the $i^*k$ prefix bits of an uncoded message. Consider any other message prefix $m'$
of the same length, with any of the first $k$ bits differing from $m$. Then there exists an event $\mathcal{E}$ such that

(a) conditioned on event $\mathcal{E}$, all pairs of messages $(m, m')$ satisfy Property 1; and

(b) the probability of $\mathcal{E}^c$ is $O(1/n^6)$.

The proof of Lemma 7 is in Appendix A. Using the above propositions, we complete the proof of
Theorem 1 here. Define the event err as the one in which the likelihood of an undesirable message
(prefix) $m'$ is higher than that of the original message (prefix) $m$. Conditioned on event $\mathcal{E}$, by Lemma 7(a),
Property 1 is satisfied by all relevant codeword pairs, as required in the proof of Theorem 1 in the absence
of collisions. Therefore, conditioned on event $\mathcal{E}$, the arguments presented earlier for the no-collision
case imply that $P(\mathrm{err}\mid\mathcal{E}) = O(1/n^6)$. That, together with Lemma 7(b), yields

$$P(\mathrm{err}) = P(\mathrm{err} \cap \mathcal{E}) + P(\mathrm{err} \cap \mathcal{E}^c) \le P(\mathrm{err}\mid\mathcal{E})P(\mathcal{E}) + P(\mathcal{E}^c) \le P(\mathrm{err}\mid\mathcal{E}) + O(1/n^6) = O(1/n^6).$$

This completes the proof of Theorem 1.

4 Performance of Spinal Codes over AWGN 

The main result of this section is that spinal codes achieve Shannon capacity over the AWGN channel 
with a polynomial-time encoder and decoder. The arguments are similar to the BSC case. 

AWGN channel model. The transmitter's primary resource is power, measured as the squared value
of the output symbols. Typically, for regulatory and practical reasons, the average power should be at most $P$
for some $P$. If an $n$-bit message $m = (m_1, \ldots, m_n)$ is mapped to $T$ symbols $x(m) = (x_1(m), \ldots, x_T(m))$,
then the power of $x(m)$ is $\frac{1}{T}\sum_{t=1}^{T} x_t^2(m)$. The rate of such a code is $R = n/T$ bits/symbol. When these
symbols are transmitted over the AWGN channel, the receiver sees $y = x + z$, where the noise vector
$z = (z_1, \ldots, z_T)$ has i.i.d. Gaussian components with mean 0 and variance $\sigma^2$. The capacity of this
channel is $C_{\mathrm{awgn}}(P) = \frac{1}{2}\log_2(1 + \mathrm{SNR})$ bits/symbol, where $\mathrm{SNR} = P/\sigma^2$ denotes the signal-to-noise ratio.
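A small sketch of the AWGN model, assuming real-valued symbols as Python floats and `random.Random.gauss` for the noise; names are hypothetical.

```python
from __future__ import annotations

import math
import random


def awgn_capacity(P: float, sigma2: float) -> float:
    """C_awgn(P) = 1/2 log2(1 + SNR) bits/symbol, with SNR = P / sigma^2."""
    return 0.5 * math.log2(1 + P / sigma2)


def average_power(x: list[float]) -> float:
    """(1/T) sum_t x_t^2, to be compared against the power budget P."""
    return sum(xt * xt for xt in x) / len(x)


def awgn_transmit(x: list[float], sigma2: float, rng: random.Random) -> list[float]:
    """y = x + z with z_t i.i.d. Gaussian, mean 0, variance sigma2."""
    sd = math.sqrt(sigma2)
    return [xt + rng.gauss(0.0, sd) for xt in x]
```

For instance, SNR = 3 gives exactly 1 bit/symbol and SNR = 15 gives 2 bits/symbol.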

Encoder and Decoder. The procedure described in §2.1 for generating output symbols for the BSC is modified slightly to produce a stream of coded symbols in $\mathbb{R}$. The modified encoder generates coded symbols from each $v$-bit spine value: in the first pass, the encoder produces $n/k$ symbols $x_1, \ldots, x_{n/k}$ using the $c$ most significant bits of $s_1, \ldots, s_{n/k}$, respectively. In the next pass, the next $c$ most significant bits are used, and so on.
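The pass structure amounts to slicing each spine value into $c$-bit chunks, most significant first. This sketch assumes the $v$-bit spine value $s_i$ is stored as a nonnegative Python integer; `pass_bits` and `pass_chunks` are hypothetical names. It raises an error once the $v$ bits are used up, since how the encoder continues beyond $v/c$ passes is not specified in this section.

```python
def pass_bits(spine: int, v: int, c: int, p: int) -> int:
    """c-bit chunk of a v-bit spine value used in pass p (0-indexed).

    Pass 0 takes the c most significant bits, pass 1 the next c bits, and so on.
    """
    if not 0 <= spine < (1 << v):
        raise ValueError("spine value must fit in v bits")
    shift = v - (p + 1) * c
    if shift < 0:
        # Hypothetical choice: stop rather than guess how further bits are produced.
        raise ValueError("not enough spine bits for this pass")
    return (spine >> shift) & ((1 << c) - 1)


def pass_chunks(spines, v: int, c: int, p: int):
    """Chunks b_1, ..., b_{n/k} for pass p, one per spine value s_1, ..., s_{n/k}."""
    return [pass_bits(s, v, c, p) for s in spines]
```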

The sequence of $c$ input bits is treated as a binary number $b \in \{0, \ldots, 2^c - 1\}$. The encoder computes each output symbol as $x_i = \Phi^{-1}(\gamma + (1 - 2\gamma)u)\sqrt{P}$, where $\Phi$ is the CDF of the standard Gaussian, $u = (b + 1/2)/2^c$, and $\gamma = \Phi(-\beta)$. The symbols generated lie in the range $[-\beta\sqrt{P}, \beta\sqrt{P}]$, and within that range they are distributed like a Gaussian with mean 0 and variance $P$, quantized into $2^c$ equally probable values. As $\beta, c \to \infty$, the coded symbols become i.i.d. Gaussian.
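A sketch of this symbol mapping, using the standard-library `statistics.NormalDist` for $\Phi$ (`cdf`) and $\Phi^{-1}$ (`inv_cdf`). The function name `spinal_symbol` is hypothetical; this illustrates the formula above and is not the authors' implementation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Phi = STD_NORMAL.cdf, Phi^{-1} = STD_NORMAL.inv_cdf


def spinal_symbol(b: int, c: int, beta: float, P: float) -> float:
    """x = Phi^{-1}(gamma + (1 - 2*gamma) * u) * sqrt(P), with u = (b + 1/2) / 2^c."""
    if not 0 <= b < (1 << c):
        raise ValueError("b must lie in {0, ..., 2^c - 1}")
    gamma = STD_NORMAL.cdf(-beta)        # gamma = Phi(-beta)
    u = (b + 0.5) / (1 << c)             # strictly inside (0, 1)
    return STD_NORMAL.inv_cdf(gamma + (1.0 - 2.0 * gamma) * u) * math.sqrt(P)


# All 2^c symbols for c = 3, beta = 2, P = 4.
symbols = [spinal_symbol(b, 3, 2.0, 4.0) for b in range(8)]
```

Because $u$ stays strictly inside $(0, 1)$, the argument of $\Phi^{-1}$ lies strictly between $\gamma$ and $1 - \gamma$, which is why every symbol falls inside $[-\beta\sqrt{P}, \beta\sqrt{P}]$.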

The only change to the decoder is to use the squared $\ell_2$ (Euclidean) distance instead of the Hamming distance in Q. The intuition is that in each case, given the channel parameters, the distance metric gives (up to sign and normalization) the log-likelihood that a message is correct given the observation $y$.
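The likelihood interpretation can be checked directly. For Gaussian noise, $\ln p(y \mid x) = -\frac{T}{2}\ln(2\pi\sigma^2) - \|y - x\|^2/(2\sigma^2)$, and for a BSC with crossover probability $p < 1/2$, $\ln p(y \mid x) = d_H \ln p + (T - d_H)\ln(1 - p)$, which decreases in the Hamming distance $d_H$. The sketch below (hypothetical helper names) shows that minimizing either distance picks the same candidate as maximizing the corresponding likelihood.

```python
import math


def sq_euclidean(y, x):
    """Squared l2 distance ||y - x||^2 (AWGN decoding metric)."""
    return sum((yt - xt) ** 2 for yt, xt in zip(y, x))


def hamming(y, x):
    """Number of positions where y and x differ (BSC decoding metric)."""
    return sum(yt != xt for yt, xt in zip(y, x))


def awgn_loglik(y, x, sigma2):
    """ln p(y | x) for i.i.d. Gaussian noise with variance sigma2."""
    T = len(y)
    return -0.5 * T * math.log(2 * math.pi * sigma2) - sq_euclidean(y, x) / (2 * sigma2)


def bsc_loglik(y, x, p):
    """ln p(y | x) for a BSC with crossover probability p."""
    d = hamming(y, x)
    return d * math.log(p) + (len(y) - d) * math.log(1 - p)
```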



Performance over AWGN. The following result shows that spinal codes achieve nearly optimal 
rates over the AWGN channel in a rateless manner with efficient encoding and decoding algorithms. 



Theorem 2. Consider an AWGN channel with noise variance bounded below by $\sigma^2_{\min}$. Consider a spinal code constrained to have average power at most $P$, with $k > \frac{1}{2}\log(1 + P/\sigma^2_{\min})$. Let the code map message bits to coded symbols with $\beta, c$ as per (33), with $\varepsilon = 1/k$ and $\sigma_{\min}$ in place of $\sigma$ in (33). Let $v = \Theta(k^2 c \log n)$, and let the decoder operate with $B = n^{O(k^2)}$. Then the decoder will correctly decode all but the last $O(k^2 \log n)$ message bits with probability at least $1 - 1/n^2$ within time $T$ such that the induced rate $R = n/T$ satisfies

$$R \ge C_{\mathrm{awgn}}(P) - O(1/k). \tag{10}$$



The proof involves a choice of parameters $\beta = \Theta(\sqrt{\log k})$ and $c = \Theta(|\log \mathrm{SNR}| + |\log \sigma_{\min}| + \log k)$. These parameters depend on $\sigma_{\min}$ to bound the "dynamic range" of the channel capacity.

Proof. The highest rate at which the code can operate is $k$. Choose $k$ large enough so that $k > \frac{1}{2}\log(1 + \mathrm{SNR}_{\max})$, where $\mathrm{SNR}_{\max} = P/\sigma^2_{\min}$. Now let $\varepsilon = 1/k$ and assign the remaining parameters as in Claim 11 (in Appendix B; Claim 11 mirrors Lemma 3).

The proof of Claim 11 still holds with $C = C_{\mathrm{awgn}} - 1/k$, so a rate $R$ such that $C - R = C^2/k$ is achievable. That is, $R = C_{\mathrm{awgn}} - \Theta(1/k)$. The proof of Theorem 2 proceeds according to the same arguments as the proof of Theorem 1, with Claim 11 replacing Lemma 3, to achieve (for large enough $k$) the bound

$$P_e \le 2^{-N(C_{\mathrm{awgn}} - 1/k - R)} = 2^{-\Theta(NC^2/k)}. \tag{11}$$

Subsequently, the bound on $P_e(i)$ used in the proof of Theorem 1 is replaced by

$$P_e(i) = 2^{-\Theta(iLC^2/k)}. \tag{12}$$

That is, $i^*$ is chosen to be $\Theta(kC^{-2}\log n)$ rather than the larger value used in the proof of Theorem 1, and now $B = 2^{i^* k} = n^{O(k^2)}$ rather than $n^{O(k^3)}$. Finally, $v$ is required to be $\Theta(k^2 c \log n)$ rather than $\Theta(k^2 \log n)$. □
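The rate bookkeeping in this proof can be reproduced numerically. Assuming, as stated, $C = C_{\mathrm{awgn}} - 1/k$ and a back-off of $C - R = C^2/k$, one gets $k(C_{\mathrm{awgn}} - R) = 1 + C^2$, which stays bounded as $k$ grows, so the gap is $\Theta(1/k)$. The function names are hypothetical.

```python
import math


def awgn_capacity_bits(snr: float) -> float:
    """C_awgn = 1/2 * log2(1 + SNR)."""
    return 0.5 * math.log2(1.0 + snr)


def proof_rate(snr: float, k: int) -> float:
    """R = C - C^2 / k with C = C_awgn - 1/k, following the rate-gap step above."""
    C = awgn_capacity_bits(snr) - 1.0 / k
    return C - C * C / k


for k in (8, 16, 64, 256):
    gap = awgn_capacity_bits(15.0) - proof_rate(15.0, k)
    print(k, gap, k * gap)  # k * gap = 1 + C^2 stays bounded, so the gap is Theta(1/k)
```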



5 Conclusion 

We proved that spinal codes achieve Shannon capacity for the BSC and AWGN channels with an efficient polynomial-time encoder and decoder; they are the first rateless codes with these properties. The key idea in the spinal code is the application of a hash function in a sequential manner over the message bits. The sequential structure of the code turns out to be crucial for efficient decoding, while the pairwise independence of the hash function provides enough pseudo-randomness to ensure that the code essentially achieves capacity.

The key idea in the proof is an unusual application of a variant of Gallager's famous result characterizing the error exponent of random codes for any memoryless channel; the use of this result is unconventional because the spinal code is not a traditional random code. Our work provides a methodical and effective way to de-randomize Shannon's random codebook construction, and as such, applies immediately to all discrete memoryless channels and will likely generalize to other random coding arguments in information theory.
A Proof of Lemma 7

Proof. To construct event $\mathcal{E}$, consider the original message (prefix) $m$ and any other message (prefix) $m'$ that differs from $m$ in at least one of the first $k$ bits. Both messages are of length $i^*k$. Given $m$, denote the set of all such message prefixes $m'$ by $\mathcal{M}'(m)$ (note that $|\mathcal{M}'(m)| = (2^k - 1)\,2^{(i^*-1)k} < 2^{i^*k}$).

At the end of $L$ passes, the codewords generated from these (prefix) messages are of length $N = i^*L$. Call them $x$ and $x'$, respectively. We wish to evaluate the joint probability of $x = b$ and $x' = b'$ for any $b, b' \in \{0,1\}^N$ (effectively, we assume a re-indexing of the coded bits so that the first $L$ coded bits depend on the first spine value, the next $L$ coded bits depend on the next spine value, and so on). Since the messages $m$ and $m'$ differ in the first $k$ bits, by the property of the hash function (Proposition 5), the $v$ bits of the first spine values for the two messages are i.i.d. uniform random bits. If the first spine values of the two messages differ (i.e., no collision), which happens with probability $1 - 2^{-v}$, the $v$ bits of the second spine values for the two messages are i.i.d. uniform random bits, and so on.
Let $E_j$ be the event that the two messages have distinct spine values at each of the first $j$ positions (i.e., no collision amongst the first $j$ spine values). Then $P(E_j \mid E_{j-1}) = 1 - 2^{-v}$. Therefore, for $j > 1$, and since $E_j \subseteq E_{j-1}$,

$$P(E_j) = P(E_j \cap E_{j-1}) = P(E_j \mid E_{j-1})\,P(E_{j-1}) = (1 - 2^{-v})\,P(E_{j-1}) = (1 - 2^{-v})^j. \tag{13}$$
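Equation (13) can be sanity-checked by simulation under the idealization used in the proof: as long as no collision has occurred, each new pair of spine values behaves like two independent uniform $v$-bit integers. This is a modelling assumption standing in for Proposition 5, not an implementation of the actual hash, and the names are hypothetical. The second check confirms the union-bound form $1 - (1 - 2^{-v})^j \le j\,2^{-v}$ that underlies estimates of the type (21).

```python
import random


def no_collision_prob(v: int, j: int) -> float:
    """P(E_j) = (1 - 2^{-v})^j, equation (13)."""
    return (1.0 - 2.0 ** (-v)) ** j


def simulate_no_collision(v: int, j: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(E_j) under the idealized model: at each of the
    first j positions the two spine values are fresh independent uniform v-bit
    integers, and E_j fails at the first position where they coincide."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if all(rng.getrandbits(v) != rng.getrandbits(v) for _ in range(j)):
            hits += 1
    return hits / trials
```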
Now, conditioned on $E_{j-1}$, the $L$ coded bits generated from the $j$th spine value in $x$ and $x'$ are i.i.d. and uniformly distributed. Therefore (with notation $x_{i,j} = (x_i, \ldots, x_j)$, etc.)

$$\begin{aligned}
P(x = b, x' = b') &\ge P(x = b, x' = b', E_1) \\
&= P(x_{L+1,N} = b_{L+1,N}, x'_{L+1,N} = b'_{L+1,N} \mid E_1)\,P(x_{1,L} = b_{1,L}, x'_{1,L} = b'_{1,L}, E_1) \\
&\ge P(x_{L+1,N} = b_{L+1,N}, x'_{L+1,N} = b'_{L+1,N} \mid E_1)\,\big(P(x_{1,L} = b_{1,L}, x'_{1,L} = b'_{1,L}) - P(E_1^c)\big) \\
&= P(x_{L+1,N} = b_{L+1,N}, x'_{L+1,N} = b'_{L+1,N} \mid E_1)\,\big(P(x_{1,L} = b_{1,L})\,P(x'_{1,L} = b'_{1,L}) - P(E_1^c)\big) \\
&= P(x_{L+1,N} = b_{L+1,N}, x'_{L+1,N} = b'_{L+1,N} \mid E_1)\,\big(2^{-2L} - 2^{-v}\big) \\
&= P(x_{L+1,N} = b_{L+1,N}, x'_{L+1,N} = b'_{L+1,N} \mid E_1)\,2^{-2L}\big(1 - 2^{-v+2L}\big) \\
&= \big(1 - 2^{-v+2L}\big)\,P(x_{1,L} = b_{1,L})\,P(x'_{1,L} = b'_{1,L})\,P(x_{L+1,N} = b_{L+1,N}, x'_{L+1,N} = b'_{L+1,N} \mid E_1).
\end{aligned}\tag{14}$$

Here, we have used the fact that $(x_{L+1,N}, x'_{L+1,N})$ is conditionally independent of $(x_{1,L}, x'_{1,L})$ given $E_1$. Then (14) sets up a recursion, leading to

$$P(x = b, x' = b') \ge P(x = b)\,P(x' = b')\,\big(1 - 2^{-v+2L}\big)^{i^*}. \tag{15}$$
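Given this, with respect to the underlying probability space $\Omega$, we can define an event $\mathcal{E}_{m,m'}$ with

$$P(\mathcal{E}_{m,m'}) = \big(1 - 2^{-v+2L}\big)^{i^*}, \tag{16}$$

so that we have

$$\mathbb{1}_{\{x = b,\, x' = b'\}} = P(x = b)\,P(x' = b')\,\mathbb{1}_{\mathcal{E}_{m,m'}} + q(b, b')\,\mathbb{1}_{\mathcal{E}^c_{m,m'}}, \tag{17}$$

where $\mathbb{1}_E$ is the indicator random variable of event $E$, with $\mathbb{1}_E(\omega) = 1$ if $\omega \in E$ and 0 otherwise, for $\omega \in \Omega$, and $q(\cdot, \cdot)$ represents the conditional probability distribution of $x, x'$ given $\mathcal{E}^c_{m,m'}$. Equivalently, what we have is an event $\mathcal{E}_{m,m'}$ with property (16) such that

$$P(x = b, x' = b' \mid \mathcal{E}_{m,m'}) = P(x = b)\,P(x' = b'). \tag{18}$$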

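Now define

$$\mathcal{E} = \bigcap_{m' \in \mathcal{M}'(m)} \mathcal{E}_{m,m'}. \tag{19}$$

Since $\mathcal{E} \subseteq \mathcal{E}_{m,m'}$ for any $m' \in \mathcal{M}'(m)$, and from (17) the conditional distribution of $x, x'$ given $\mathcal{E}_{m,m'}$ is uniform, it follows that, for any $m' \in \mathcal{M}'(m)$,

$$P(x = b, x' = b' \mid \mathcal{E}) = P(x = b)\,P(x' = b'). \tag{20}$$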



Finally, by (16) and the union bound, it follows that

$$P(\mathcal{E}^c) = O\big(i^*\,2^{N+2L-v}\big). \tag{21}$$

Therefore, choosing

$$v = (N + 2L + \log i^*) + 6\log n = \Theta(k^2 \log n), \tag{22}$$

with an appropriately large constant, leads to

$$P(\mathcal{E}^c) = O(1/n^6), \tag{23}$$

as desired, completing the proof of Lemma 7. □
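A small arithmetic sketch of the choice (22), assuming base-2 logarithms and rounding $v$ up to a whole number of bits. It shows that with $v = N + 2L + \log i^* + 6\log n$, the quantity $i^* 2^{N+2L-v}$ in (21) is at most $n^{-6}$. `choose_v` and `collision_bound` are hypothetical names.

```python
import math


def choose_v(N: int, L: int, i_star: int, n: int) -> int:
    """v = N + 2L + log2(i*) + 6 log2(n), as in (22), rounded up to whole bits."""
    return math.ceil(N + 2 * L + math.log2(i_star) + 6 * math.log2(n))


def collision_bound(N: int, L: int, i_star: int, v: int) -> float:
    """The quantity i* * 2^(N + 2L - v) appearing in (21)."""
    return i_star * 2.0 ** (N + 2 * L - v)


v = choose_v(N=40, L=4, i_star=10, n=1024)
print(v, collision_bound(40, 4, 10, v), 1024.0 ** -6)
```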
B Error probability of random codes over AWGN



As in §3.1, we consider random codes with only pairwise independent codewords across messages.

Property 8 (Pairwise independent random code for AWGN with distribution $Q$). A code that maps every $n$-bit message $m \in \{0,1\}^n$ to a random codeword of $T$ real numbers, $x(m)$, so that (i) for a given $m$, $x_1(m), \ldots, x_T(m)$ are i.i.d. with distribution $Q$, (ii) for any $m \ne m'$, $x(m)$ and $x(m')$ are independent of each other, and (iii) the joint distribution of all codewords is symmetric.

This definition allows us to state the following variant of Gallager's error-exponent result [10, Theorem 7.3.2] for a random code on the AWGN channel. The coded symbols have distribution $Q$ over a finite set $\Omega \subset \mathbb{R}$.

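Lemma 9. Consider an AWGN channel with noise variance $\sigma^2$ and a pairwise independent random code for AWGN with distribution $Q$, message length $n$, code length $T$, and rate $R = n/T < C$. Then the probability of error under ML decoding is bounded by

$$P_e \le 2^{-T(E_0(Q) - R)}, \quad \text{where } E_0(Q) = -\log\left\{\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\left[\sum_{j \in \Omega} Q(j)\exp\left(-\frac{(y - j)^2}{4\sigma^2}\right)\right]^2 dy\right\}. \tag{24}$$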
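The exponent $E_0(Q)$ in (24) can be evaluated numerically for any finite constellation. This sketch assumes $Q$ is uniform over the given points, uses base-2 logarithms, and approximates the $y$-integral with the trapezoid rule on a grid truncated 12 noise standard deviations beyond the outermost points; these are illustrative numerical choices, and `gallager_e0_awgn` is a hypothetical name. As a check, when the noise is tiny relative to the point spacing, the cross terms vanish and $E_0(Q)$ approaches $\log_2 |\Omega|$.

```python
import math


def gallager_e0_awgn(points, sigma2, grid=20000):
    """E_0(Q) from (24), in bits, for Q uniform over `points` (a finite set Omega).

    The y-integral uses the trapezoid rule on a grid that extends
    12 noise standard deviations beyond the outermost points.
    """
    q = 1.0 / len(points)
    sigma = math.sqrt(sigma2)
    lo, hi = min(points) - 12 * sigma, max(points) + 12 * sigma
    h = (hi - lo) / grid
    total = 0.0
    for k in range(grid + 1):
        y = lo + k * h
        inner = sum(q * math.exp(-(y - j) ** 2 / (4 * sigma2)) for j in points)
        weight = 0.5 if k in (0, grid) else 1.0
        total += weight * inner ** 2
    integral = total * h / math.sqrt(2 * math.pi * sigma2)
    return -math.log2(integral)


print(gallager_e0_awgn([-1.5, -0.5, 0.5, 1.5], 1.0))
```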

Next, we want to specialize this bound to the spinal code symbol distribution described in §4. Given $c$, $\beta$ and $P$, let

$$\Omega = \left\{\Phi^{-1}(\gamma + (1 - 2\gamma)u)\sqrt{P} \;:\; \gamma = \Phi(-\beta),\ u = \frac{b + 1/2}{2^c},\ b \in \{0, \ldots, 2^c - 1\}\right\},$$

where $\Phi$ is the CDF of the standard Gaussian. By construction, $|\Omega| = 2^c$ and $\Omega \subset [-\beta\sqrt{P}, \beta\sqrt{P}]$. The distribution over $b$ is uniform, as in the case of the spinal encoder, and hence each $j \in \Omega$ is equally likely, with probability $2^{-c}$. This leads to the following result.

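Lemma 10. For the channel and code of Lemma 9 with uniform distribution over $\Omega$ (with parameters $\beta, c, P$), and for any constant $\zeta > 1$, the probability of error under ML decoding is bounded above as

$$P_e \le 2^{-T(E' - R)}, \quad \text{where } E' = \frac{1}{2}\log\left(1 + \frac{P}{\sigma^2(1 + 1/\zeta)^2}\right) - \frac{2r(\beta)}{\ln 2} - \frac{(2\zeta + 1)\Delta^2 \log e}{4\sigma^2}, \tag{25}$$

with $r(\beta)$ and $\Delta$ as defined in the proof below.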

Proof. Lemma 9 indicates that a random code generated with this distribution would have error probability

$$P_e \le 2^{-T(E - R)}, \quad \text{where } E = -\log\left\{\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\left[\sum_{j \in \Omega} 2^{-c}\exp\left(-\frac{(y - j)^2}{4\sigma^2}\right)\right]^2 dy\right\}. \tag{26}$$

The expression above is explicit but opaque. We can simplify it by rewriting the summation over discrete $j \in \Omega$ as an integral over $x \in [-\beta\sqrt{P}, \beta\sqrt{P}]$ with Gaussian density, provided that we construct a suitable function $\delta(x)$ so that $j = x + \delta(x)$ is distributed uniformly over $\Omega$. Extracting $\delta(x)$ from the resulting integrand will yield a tractable expression.

Using the mean value theorem and the properties of the Gaussian, the separation between two adjacent elements in $\Omega$ can be bounded above by

$$\Delta \le \Delta(\beta, c, P) = \frac{\sqrt{2\pi P}\,e^{\beta^2/2}}{2^c - 1}.$$
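The set $\Omega$ and the spacing bound can be checked numerically. This sketch takes $\Delta(\beta, c, P) = \sqrt{2\pi P}\,e^{\beta^2/2}/(2^c - 1)$ as an assumed form; it follows from the mean value theorem, since $\frac{d}{dp}\Phi^{-1}(p) = \sqrt{2\pi}\,e^{t^2/2}$ at $t = \Phi^{-1}(p)$ and adjacent points are $(1 - 2\gamma)/2^c$ apart in probability. The helper names are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def omega(c: int, beta: float, P: float):
    """Sorted list of the 2^c points of Omega for parameters (beta, c, P)."""
    gamma = PHI.cdf(-beta)
    return [PHI.inv_cdf(gamma + (1 - 2 * gamma) * (b + 0.5) / 2 ** c) * math.sqrt(P)
            for b in range(2 ** c)]


def delta_bound(c: int, beta: float, P: float) -> float:
    """Assumed spacing bound sqrt(2*pi*P) * exp(beta^2 / 2) / (2^c - 1)."""
    return math.sqrt(2 * math.pi * P) * math.exp(beta ** 2 / 2) / (2 ** c - 1)


pts = omega(4, 2.5, 1.0)
max_gap = max(b - a for a, b in zip(pts, pts[1:]))
```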



Now consider the following thought experiment. First, sample a Gaussian variable with mean 0 and variance $P$. If the outcome is within $[-\beta\sqrt{P}, \beta\sqrt{P}]$, map it to a nearby value in $\Omega$ so that the induced distribution over the elements of $\Omega$ is uniform (equiprobable quantization); if the outcome is not within $[-\beta\sqrt{P}, \beta\sqrt{P}]$, reject it (truncation). The rejection probability is $r(\beta) = 2(1 - \Phi(\beta)) = O(\beta^{-1}e^{-\beta^2/2})$. We can relate the quantized value $j$ to the sampled Gaussian value $x$ by an additive discretization error $\delta(x) = j - x$. From these properties, it follows that the discrete summation involving probabilities $2^{-c}$ over $\Omega$ in (26) can be replaced by an integral over the Gaussian density with mean 0 and variance $P$, normalized by $1/(1 - r(\beta))$ and limited to the range $[-\beta\sqrt{P}, \beta\sqrt{P}]$:
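The equiprobable quantization in this thought experiment can be made concrete: locate the probability cell of $x$ through $\Phi(x/\sqrt{P})$ and return that cell's representative point in $\Omega$. This particular construction of $j = x + \delta(x)$ is one natural choice consistent with the description, assumed here for illustration; `quantize` is a hypothetical name. The test checks the rejection rate against $r(\beta)$, that accepted samples land uniformly on the $2^c$ points, and that $|\delta(x)|$ stays below a mean-value-theorem spacing bound.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()


def quantize(x: float, c: int, beta: float, P: float):
    """Equiprobable quantization of x to Omega; returns None if |x| > beta*sqrt(P).

    The probability cell of x is found through Phi(x / sqrt(P)); the returned
    point j is that cell's representative, so delta(x) = j - x.
    """
    s = math.sqrt(P)
    if abs(x) > beta * s:
        return None                                  # truncation (rejection)
    gamma = PHI.cdf(-beta)
    # Uniform on [0, 1] when x ~ N(0, P) conditioned on acceptance.
    u = (PHI.cdf(x / s) - gamma) / (1 - 2 * gamma)
    b = min(max(int(u * 2 ** c), 0), 2 ** c - 1)     # cell index, clamped at the ends
    return PHI.inv_cdf(gamma + (1 - 2 * gamma) * (b + 0.5) / 2 ** c) * s


rng = random.Random(7)
samples = [rng.gauss(0.0, 1.0) for _ in range(5)]
quantized = [quantize(x, 2, 2.0, 1.0) for x in samples]
```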

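$$E = -\log\left\{\int_{-\infty}^{\infty} \frac{1}{\sqrt{8\pi^3P^2\sigma^2}}\left[\int_{-\beta\sqrt{P}}^{\beta\sqrt{P}} (1 - r(\beta))^{-1}\exp\left(-\frac{x^2}{2P}\right)\exp\left(-\frac{(y - x - \delta(x))^2}{4\sigma^2}\right)dx\right]^2 dy\right\}. \tag{27}$$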



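By construction, $|\delta(x)| \le \Delta$. We can pull $\delta(x)$ out of the integrand by placing a multiplicative bound on $\exp(-(y - x - \delta(x))^2/4\sigma^2)$ in terms of $\exp(-(y - x)^2/4\sigma^2)$ and a small error term involving $\Delta$. Let $\zeta > 1$ be a large constant. Then if $|y - x| \le \zeta|\delta(x)|$,

$$\exp\left(-\frac{(y - x - \delta(x))^2}{4\sigma^2}\right) \le \exp\left(-\frac{(y - x)^2}{4\sigma^2}\right)\exp\left(\frac{(2\zeta + 1)\Delta^2}{4\sigma^2}\right). \tag{28}$$

Otherwise, $|y - x| > \zeta|\delta(x)|$ and hence

$$\exp\left(-\frac{(y - x - \delta(x))^2}{4\sigma^2}\right) \le \exp\left(-\frac{(y - x)^2}{4\sigma^2(1 + 1/\zeta)^2}\right). \tag{29}$$

From (27)-(29), it follows (using the approximation $\log(1 - x) \approx -x/\ln 2$ for small $x$, and treating $r(\beta)$ as small or, equivalently, $\beta$ as large) that

$$E \ge -\log\left\{\int_{-\infty}^{\infty} \frac{1}{\sqrt{8\pi^3P^2\sigma^2}}\left[\int_{-\infty}^{\infty}\exp\left(-\frac{x^2}{2P}\right)\exp\left(-\frac{(y - x)^2}{4\sigma^2(1 + 1/\zeta)^2}\right)dx\right]^2 dy\right\} - \frac{2r(\beta)}{\ln 2} - \frac{(2\zeta + 1)\Delta^2 \log e}{4\sigma^2}. \tag{30}$$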



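As established in [10, Eq. (7.4.21)],

$$-\log\left\{\int_{-\infty}^{\infty} \frac{1}{\sqrt{8\pi^3P^2\sigma^2(1 - 1/\zeta)^2}}\left[\int_{-\infty}^{\infty}\exp\left(-\frac{x^2}{2P}\right)\exp\left(-\frac{(y - x)^2}{4\sigma^2(1 + 1/\zeta)^2}\right)dx\right]^2 dy\right\} \ge \frac{1}{2}\log\left(1 + \frac{P}{\sigma^2(1 + 1/\zeta)^2}\right). \tag{31}$$

Combining (30) and (31), we obtain the desired result. □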



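Claim 11. With an appropriate choice of parameters, for a pairwise independent random code over AWGN with distribution $Q$, the probability of error for a rate $R < C_{\mathrm{awgn}} - \varepsilon$ is bounded as

$$P_e \le 2^{-T(C_{\mathrm{awgn}} - \varepsilon - R)}. \tag{32}$$

Proof. For a given small enough $\varepsilon > 0$, select

$$\begin{aligned}
&\zeta = 9\,\mathrm{SNR}/\varepsilon, \quad \text{with } \mathrm{SNR} = P/\sigma^2,\\
&\beta \text{ large enough so that } r(\beta) \le \frac{\varepsilon \ln 2}{9}, \quad \text{where recall } r(\beta) = 2(1 - \Phi(\beta)),\\
&c \text{ large enough so that } \Delta \le \frac{\varepsilon\sigma^2}{9\sqrt{P}}, \quad \text{where recall } \Delta = \frac{\sqrt{2\pi P}\,e^{\beta^2/2}}{2^c - 1}.
\end{aligned}\tag{33}$$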
This selection leads to $\beta = \Theta(\sqrt{\log(1/\varepsilon)})$ and $c = \Theta(\log \mathrm{SNR}_{\max} + |\log \sigma_{\min}| + \log(1/\varepsilon))$, with $\mathrm{SNR}_{\max} = P/\sigma^2_{\min}$. Now, with these choices of parameters, and using the fact that $\frac{1}{2}\log(1 + x)$ is a 1-Lipschitz function, we obtain from (25) that

$$E' \ge C_{\mathrm{awgn}} - \varepsilon. \tag{34}$$

□

C Variation of Gallager's result: Pairwise independent random code and discrete memoryless channel

Here we present a derivation of a variation of Gallager's result about the error exponent (or error probability) of a random code under the ML decoding rule for any discrete memoryless channel. The variation assumes only pairwise independence of the random codewords rather than complete independence. Effectively, we observe that the proof technique of Gallager [10] requires only pairwise independence. Since results identical to Lemmas 2 and 9 were derived by specializing Gallager's result to the BSC and the AWGN channel, respectively (see [10, Chapters 5, 7]), the justification of these two lemmas follows.

Pairwise independent random code. Consider $n$-bit messages in $\{0,1\}^n$. Let $Q$ be a distribution over an input alphabet $\mathcal{J}$. Under a pairwise independent random code using $Q$, of rate $R = n/T$, each message $m \in \{0,1\}^n$ is mapped to a random codeword $x(m) \in \mathcal{J}^T$ such that

(a) For any $m \in \{0,1\}^n$ and $i = (i_1, \ldots, i_T) \in \mathcal{J}^T$,

$$P(x(m) = i) = \prod_{t=1}^{T} Q(i_t). \tag{35}$$

(b) For any $m \ne m' \in \{0,1\}^n$ and $i, i' \in \mathcal{J}^T$,

$$P(x(m) = i, x(m') = i') = P(x(m) = i) \times P(x(m') = i'). \tag{36}$$

(c) The joint distribution of all codewords is symmetric.

Maximum likelihood decoding. To transmit message $m$, the codeword $x(m)$ is sent over the channel, producing output $y$. The ML rule produces an estimate $\hat{m}$ such that

$$P(y \mid x(\hat{m})) = \max_{m' \in \{0,1\}^n} P(y \mid x(m')). \tag{37}$$

A decoding error occurs if $\hat{m} \ne m$.

Probability of error. Let $P_{em}$ denote the probability of decoding error when $m$ was transmitted, averaged over the random choice of code. As before, the overall probability of error is

$$P_e = \frac{1}{2^n}\sum_{m \in \{0,1\}^n} P_{em}. \tag{38}$$

Due to the symmetry of the random code, $P_{em}$ is the same for all $m$. Therefore, $P_e$ equals $P_{em}$ for any given $m$.
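Theorem 3. Given the above setup, for any $m \in \{0,1\}^n$,

$$P_{em} \le 2^{-T(-\rho R + E_0(\rho, Q))} \tag{39}$$

for any $0 < \rho \le 1$, with

$$E_0(\rho, Q) = -\log\left[\sum_{y \in \mathcal{O}}\left(\sum_{i \in \mathcal{J}} Q(i)\,P(y \mid i)^{1/(1+\rho)}\right)^{1+\rho}\right], \tag{40}$$

where $\mathcal{O}$ denotes the channel output alphabet. The best bound is achieved by optimizing over the choice of $\rho$ and $Q$. Specifically, define

$$E_r(R) = \max_{0 \le \rho \le 1}\,\max_{Q}\,\big[-\rho R + E_0(\rho, Q)\big]. \tag{41}$$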



Then Theorem 3 implies the bound $P_{em} \le 2^{-T E_r(R)}$. This bound, when specialized to the BSC and the AWGN channel (with a proper choice of $\rho$ and $Q$ in (41)), results in Lemmas 2 and 9 (see [10, Chapters 5, 7] for details).
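For a discrete memoryless channel given as a transition matrix, $E_0(\rho, Q)$ from (40) and the inner maximization in (41) are straightforward to compute. This sketch fixes $Q$ (the uniform distribution is the natural choice for the BSC by symmetry) and maximizes over a grid of $\rho \in [0, 1]$ instead of solving the optimization exactly; logarithms are base 2 and all names are hypothetical. For the BSC it reproduces the closed form $E_0(1, Q) = 1 - \log_2(1 + 2\sqrt{p(1-p)})$.

```python
import math


def gallager_e0(rho: float, Q, W) -> float:
    """E_0(rho, Q) from (40), in bits. W[i][y] = P(y | i); Q[i] = input probability."""
    s = 1.0 / (1.0 + rho)
    total = 0.0
    for y in range(len(W[0])):
        inner = sum(Q[i] * W[i][y] ** s for i in range(len(Q)))
        total += inner ** (1.0 + rho)
    return -math.log2(total)


def random_coding_exponent(R: float, Q, W, steps: int = 1000) -> float:
    """max over a grid of rho in [0, 1] of -rho*R + E_0(rho, Q), for a fixed Q."""
    return max(-rho * R + gallager_e0(rho, Q, W)
               for rho in (k / steps for k in range(steps + 1)))
```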

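Proof of Theorem 3. The proof is essentially identical to that in [9], and is presented here for completeness. Consider a message $m \in \{0,1\}^n$. Then,

$$P(\hat{m} \ne m \mid m) = \sum_{y \in \mathcal{O}^T} P(y \mid x(m))\,\phi_m(y), \tag{42}$$

where

$$\phi_m(y) = \begin{cases} 1 & \text{if } P(y \mid x(m)) \le P(y \mid x(m')) \text{ for some } m' \ne m,\\ 0 & \text{otherwise.} \end{cases} \tag{43}$$

This can be upper bounded as

$$\phi_m(y) \le \left[\frac{\sum_{m' \ne m} P(y \mid x(m'))^{1/(1+\rho)}}{P(y \mid x(m))^{1/(1+\rho)}}\right]^{\rho}, \quad \rho > 0, \tag{44}$$

since the bracketed term is at least 1 whenever $\phi_m(y) = 1$ and is nonnegative otherwise. From (42) and (44), we obtain

$$P(\hat{m} \ne m \mid m) \le \sum_{y \in \mathcal{O}^T} P(y \mid x(m))^{1/(1+\rho)}\left[\sum_{m' \ne m} P(y \mid x(m'))^{1/(1+\rho)}\right]^{\rho}, \quad \rho > 0. \tag{45}$$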
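Inequality (45) holds for every fixed code, before any averaging, so it can be verified by brute force on a toy example. The sketch below assumes a BSC with crossover probability 0.1 and a randomly drawn code with 8 messages and $T = 8$; it computes $P(\hat{m} \ne m \mid m)$ exactly by enumerating all $y \in \{0,1\}^T$, counting ties as errors as in (43). The names are hypothetical.

```python
import itertools
import random


def bsc_lik(y, x, p):
    """P(y | x) for a BSC with crossover probability p."""
    d = sum(a != b for a, b in zip(y, x))
    return p ** d * (1 - p) ** (len(y) - d)


def error_and_bound(code, m, p, rho):
    """Return (P(error | m), right-hand side of (45)) for one fixed code.

    Ties count as errors, matching the '<=' in (43).
    """
    T = len(code[m])
    s = 1.0 / (1.0 + rho)
    err = bound = 0.0
    for y in itertools.product((0, 1), repeat=T):
        pm = bsc_lik(y, code[m], p)
        others = [bsc_lik(y, x, p) for k, x in enumerate(code) if k != m]
        if any(pm <= po for po in others):
            err += pm
        bound += pm ** s * sum(po ** s for po in others) ** rho
    return err, bound


rng = random.Random(1)
code = [[rng.randint(0, 1) for _ in range(8)] for _ in range(8)]  # 2^3 messages, T = 8
print(error_and_bound(code, 0, 0.1, 1.0))
```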



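Now, recalling that this is a pairwise independent random code and averaging both sides with respect to this random code, we obtain for $0 < \rho \le 1$,

$$\begin{aligned}
P_{em} &= \mathbb{E}\big[P(\hat{m} \ne m \mid m)\big] \\
&\le \mathbb{E}\Big[\sum_{y \in \mathcal{O}^T} P(y \mid x(m))^{1/(1+\rho)}\Big[\sum_{m' \ne m} P(y \mid x(m'))^{1/(1+\rho)}\Big]^{\rho}\Big] \\
&= \sum_{y \in \mathcal{O}^T} \mathbb{E}\Big[P(y \mid x(m))^{1/(1+\rho)}\Big[\sum_{m' \ne m} P(y \mid x(m'))^{1/(1+\rho)}\Big]^{\rho}\Big] \\
&= \sum_{y \in \mathcal{O}^T} \mathbb{E}_{x(m)}\Big[P(y \mid x(m))^{1/(1+\rho)}\,\mathbb{E}\Big[\Big[\sum_{m' \ne m} P(y \mid x(m'))^{1/(1+\rho)}\Big]^{\rho} \,\Big|\, x(m)\Big]\Big] \\
&\overset{(a)}{\le} \sum_{y \in \mathcal{O}^T} \mathbb{E}_{x(m)}\Big[P(y \mid x(m))^{1/(1+\rho)}\Big(\mathbb{E}\Big[\sum_{m' \ne m} P(y \mid x(m'))^{1/(1+\rho)} \,\Big|\, x(m)\Big]\Big)^{\rho}\Big] \\
&\overset{(b)}{=} \sum_{y \in \mathcal{O}^T} \mathbb{E}_{x(m)}\big[P(y \mid x(m))^{1/(1+\rho)}\big]\Big(\sum_{m' \ne m} \mathbb{E}\big[P(y \mid x(m'))^{1/(1+\rho)}\big]\Big)^{\rho}.
\end{aligned}\tag{46}$$

Here, we use the notation $\mathbb{E}_{x(m)}$ to note explicitly that the randomness is with respect to $x(m)$; (a) follows from Jensen's inequality for conditional expectation and the fact that $f(x) = x^{\rho}$ is a concave function for $0 < \rho \le 1$; and (b) follows from the pairwise independence of $x(m)$ and $x(m')$ for any pair of messages $m \ne m'$. Now, due to the symmetry of the random coding distribution, $\mathbb{E}\big[P(y \mid x(m'))^{1/(1+\rho)}\big]$ is the same for all $m'$ (including $m$) and equals

$$\mathbb{E}\big[P(y \mid x(m'))^{1/(1+\rho)}\big] = \sum_{i \in \mathcal{J}^T} Q^T(i)\,P(y \mid i)^{1/(1+\rho)}, \tag{47}$$

where $Q^T(i) = \prod_{t=1}^{T} Q(i_t)$. Therefore, from (46), (47), and the fact that $n = RT$ (so there are fewer than $2^{RT}$ messages $m' \ne m$), we have

$$P_{em} \le 2^{\rho RT}\sum_{y \in \mathcal{O}^T}\Big[\sum_{i \in \mathcal{J}^T} Q^T(i)\,P(y \mid i)^{1/(1+\rho)}\Big]^{1+\rho}. \tag{48}$$

Now, using the property of memoryless channels and random codes, we have that

$$Q^T(i)\,P(y \mid i)^{1/(1+\rho)} = \prod_{t=1}^{T} Q(i_t)\,P(y_t \mid i_t)^{1/(1+\rho)}. \tag{49}$$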



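Using this product form in (48) and exchanging sums and products, we have

$$\begin{aligned}
P_{em} &\le 2^{\rho RT}\prod_{t=1}^{T}\sum_{y_t \in \mathcal{O}}\Big[\sum_{i_t \in \mathcal{J}} Q(i_t)\,P(y_t \mid i_t)^{1/(1+\rho)}\Big]^{1+\rho} \\
&\overset{(a)}{=} 2^{\rho RT}\Big(\sum_{y \in \mathcal{O}}\Big[\sum_{i \in \mathcal{J}} Q(i)\,P(y \mid i)^{1/(1+\rho)}\Big]^{1+\rho}\Big)^{T} \\
&\overset{(b)}{=} 2^{\rho RT}\,2^{-TE_0(\rho, Q)} = 2^{-T(-\rho R + E_0(\rho, Q))}.
\end{aligned}\tag{50}$$

Here, (a) uses the definitions of the random code and the memoryless channel, and (b) follows from the definition of $E_0(\rho, Q)$. □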
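The exchange of sums and products in (49) and (50) can be confirmed by brute force on a small channel. The sketch assumes an arbitrary binary-input, ternary-output DMC chosen only for illustration and compares the $T$-letter sum in (48) with the $T$-th power of the single-letter sum; the names are hypothetical.

```python
import itertools


def block_sum(Q, W, rho, T):
    """sum_{y in O^T} [ sum_{i in J^T} Q^T(i) P(y|i)^(1/(1+rho)) ]^(1+rho), by brute force."""
    s = 1.0 / (1.0 + rho)
    inputs, outputs = range(len(Q)), range(len(W[0]))
    total = 0.0
    for y in itertools.product(outputs, repeat=T):
        inner = 0.0
        for i in itertools.product(inputs, repeat=T):
            term = 1.0
            for it, yt in zip(i, y):
                term *= Q[it] * W[it][yt] ** s   # Q(i_t) P(y_t | i_t)^(1/(1+rho)), as in (49)
            inner += term
        total += inner ** (1.0 + rho)
    return total


def single_letter_sum(Q, W, rho):
    """sum_{y in O} [ sum_{i in J} Q(i) P(y|i)^(1/(1+rho)) ]^(1+rho) = 2^(-E_0(rho, Q))."""
    s = 1.0 / (1.0 + rho)
    return sum(sum(Q[i] * W[i][y] ** s for i in range(len(Q))) ** (1.0 + rho)
               for y in range(len(W[0])))
```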